Modal Keywords, Ontologies, and Reasoning for Video Understanding
Authors
Abstract
We propose a novel framework for video content understanding that uses rules constructed from knowledge bases and multimedia ontologies. Our framework consists of an expert system that combines a rule-based engine, domain knowledge, visual detectors (for objects and scenes), and metadata (text from automatic speech recognition, related text, etc.). We introduce the idea of modal keywords, which are keywords that represent perceptual concepts in the following categories: visual (e.g., sky), aural (e.g., scream), olfactory (e.g., vanilla), tactile (e.g., feather), and taste (e.g., candy). We present a method that automatically classifies keywords from speech recognition, queries, or related text into these categories using WordNet and TGM I. For video understanding, the following operations are performed automatically: scene cut detection, automatic speech recognition, feature extraction, and visual detection (e.g., sky, face, indoor). The results of these operations are passed to a rule-based engine that uses context information (e.g., text from speech) to enhance the visual detection results. We discuss semi-automatic construction of multimedia ontologies and present experiments in which visual detector outputs are modified by simple rules that use context information available with the video.
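As a rough illustration of the idea described in the abstract, the following minimal sketch classifies words into modal categories and then applies one simple context rule to visual detector scores. It is not the authors' implementation: the hand-written `MODAL_LEXICON` is a hypothetical stand-in for the WordNet and TGM I lookup, and the function names, the `boost` parameter, and the score values are all assumptions made for this example.

```python
from typing import Dict, Iterable

# Hypothetical stand-in for a WordNet / TGM I lookup (assumption, not the
# authors' resource): maps a keyword to its modal category.
MODAL_LEXICON: Dict[str, str] = {
    "sky": "visual",
    "scream": "aural",
    "vanilla": "olfactory",
    "feather": "tactile",
    "candy": "taste",
}


def classify_keywords(words: Iterable[str]) -> Dict[str, str]:
    """Return the modal category of every word found in the lexicon."""
    found: Dict[str, str] = {}
    for word in words:
        key = word.lower()
        if key in MODAL_LEXICON:
            found[key] = MODAL_LEXICON[key]
    return found


def apply_context_rule(
    scores: Dict[str, float],
    keywords: Dict[str, str],
    boost: float = 0.25,  # illustrative value, not taken from the paper
) -> Dict[str, float]:
    """Raise the score of a visual detector whose concept is named in the
    context text; scores are capped at 1.0 and the input is not mutated."""
    adjusted = dict(scores)
    for word, category in keywords.items():
        if category == "visual" and word in adjusted:
            adjusted[word] = min(1.0, adjusted[word] + boost)
    return adjusted


# Context text, e.g. from automatic speech recognition (made-up sample).
transcript = "Look at the sky and listen to that scream".split()
keywords = classify_keywords(transcript)

# Made-up detector confidences for one video shot.
detector_scores = {"sky": 0.5, "face": 0.30, "indoor": 0.40}
adjusted = apply_context_rule(detector_scores, keywords)
```

In this sketch only keywords in the visual category affect the visual detectors; an aural keyword such as "scream" is classified but leaves the scores unchanged.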
Similar resources
An Order-Sorted Quantified Modal Logic for Ontological Property Classification
In the field of ontology, several property classifications have been defined as meta-ontologies wherein the properties of individuals are rigorously classified as sortal/non-sortal, rigid/anti-rigid/non-rigid, etc. through philosophical analysis. The notions of such meta-ontologies enable the processing of properties that allow us to reason about taxonomic knowledge in information...
Higher-Order Modal Logics: Automation and Applications
These are the lecture notes of a tutorial on higher-order modal logics held at the 11th Reasoning Web Summer School. After defining the syntax and (possible worlds) semantics of some higher-order modal logics, we show that they can be embedded into classical higher-order logic by systematically lifting the types of propositions, making them depend on a new atomic type for possible worlds. This a...
Non Omniscient Intensional Contextual Reasoning for Query-agents in P2P Systems
Given the decentralized nature of the development of the Semantic Web, there will be an explosion in the number of ontologies in P2P systems. To integrate data from disparate ontologies, we must know the semantic correspondence between their elements. We argue for the full epistemic independence of peer databases: they can change their ontology and/or extension of their knowledge independent...
Review of Image Annotation for the Evaluation of Computer Vision Algorithms
In the field of computer vision, automated image annotation and object recognition are currently important research topics. It is hoped that these will lead to improved general image understanding which can be usefully applied in Content-based Image Retrieval. Three approaches to image annotation are reviewed: free text annotation, keyword annotation and annotation based on ontologies. An analy...
Extracting Salient Keywords from Instructional Videos Using Joint Text, Audio and Visual Cues
This paper presents a multi-modal feature-based system for extracting salient keywords from transcripts of instructional videos. Specifically, we propose to extract domain-specific keywords for videos by integrating various cues from linguistic and statistical knowledge, as well as derived sound classes and characteristic visual content types. The acquisition of such salient keywords will facili...